ITALICA at PAN 2013: An Ensemble Learning Approach to Author Profiling Notebook for PAN at CLEF 2013
نویسندگان
چکیده
This notebook discusses the approach to the Author Profiling task developed by the Italica group for PAN 2013. This system implements two different sets of classifiers which are combined later in order to build a final classifier that takes into account the decisions of the previous ones. The initial classifiers are focused on vector space representations of the documents as a bag of words and n-grams of POS tags and also on a set of stylistic features of the texts. The final classifier consists of a stacking schema that combines the other ones. This approach has obtained better results for the Spanish dataset than for the English dataset, probably due to the use of more detailed POS tagset in the former.
منابع مشابه
Ensemble-based Classification for Author Profiling Using Various Features Notebook for PAN at CLEF 2013
This paper summarize our approach to author profiling task – a part of evaluation lab PAN’13. We have used ensemble-based classification on large features set. All the features are roughly described and experimental section provides evaluation of different methods and classification approaches.
متن کاملStyle-based Distance Features for Author Verification Notebook for PAN at CLEF 2013
In this paper we present the approach we took in our participation to the PAN 2013 Author Profiling task. It is an adaptation of our system submitted for author identification, assuming that a profile category (authors belonging to the same gender and age group categories) can be analyzed in the same way as an author’s style.
متن کاملReadability for Author Profiling? Notebook for PAN at CLEF 2013
This paper briefly describes the approach taken to the Author Profiling task at PAN 13. It describes the simple features used, and the origins in thinking around text readability as a mechanism for identification, and the predictive model used which may have beneficially omitted classes, as well as offering commentary on the results obtained.
متن کاملStyle-based Distance Features for Author Profiling Notebook for PAN at CLEF 2013
In this paper we present the approach we took in our participation to the PAN 2013 Author Identification task. It relies on a complex process to select the features which represent the author’s writing, using potentially multiple statistics and distance measures computed from the training set.
متن کاملAuthor Profiling: Predicting Age and Gender from Blogs Notebook for PAN at CLEF 2013
Author profiling is the task of determining age, gender, native language or personality type of author by studying their sociolect aspect, that is, how language is shared by people. In this paper, we propose a Machine Learning approach to determine unknown author’s age and gender. The approach uses three types of features: content based, style based and topic based. We were able to achieve an a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013